Elementary Symbol Tables

§ Symbol table.

The primary purpose of a symbol table is to associate a value with a key. The client can insert key–value pairs into the symbol table with the expectation of later being able to search for the value associated with a given key.

Typical symbol-table applications

§ API.

Here is the API.

Symbol-table API

We consider several design choices for our implementations to make our code consistent, compact, and useful.
  • Generics. We consider the methods without specifying the types of keys and values being processed, using generics.
  • Duplicate keys. Only one value is associated with each key (no duplicate keys in a table). When a client puts a key-value pair into a table already containing that key (and an associated value), the new value replaces the old one. These conventions define the associative array abstraction, where you can think of a symbol table as being just like an array, where keys are indices and values are array entries.
  • Null values. No key can be associated with the value null. This convention is directly tied to our specification in the API that get() should return null for keys not in the table. This convention has two (intended) consequences: First, we can test whether or not the symbol table defines a value associated with a given key by testing whether get() returns null. Second, we can use the operation of calling put() with null as its second (value) argument to implement deletion.
  • Deletion. Deletion in symbol tables generally involves one of two strategies: lazy deletion, where we associate keys in the table with null, then perhaps remove all such keys at some later time, and eager deletion, where we remove the key from the table immediately. As just discussed, the code put(key, null) is an easy (lazy) implementation of delete(key). When we give an (eager) implementation of delete(), it is intended to replace this default.
  • Iterators. The keys() method returns an Iterable<Key> object for clients to use to iterate through the keys.
  • Key equality. Java requires that all objects implement an equals() method and provides implementations both for standard types such as Integer, Double, and String and for more complicated types such as Date, File and URL. For applications involving these types of data, you can just use the built-in implementation. For example, if x and y are String values, then x.equals(y) is true if and only if x and y have the same length and are identical in each character position. In practice, keys might be more complicated, as in Person.java. For such client-defined keys, you need to override equals(). Java's convention is that equals() must implement an equivalence relation:
    • Reflexive: x.equals(x) is true.
    • Symmetric: x.equals(y) is true if and only if y.equals(x) is true.
    • Transitive: if x.equals(y) and y.equals(z) are true, then so is x.equals(z).

    In addition, equals() must take an Object as argument and satisfy the following properties:

    • Consistency: multiple invocations of x.equals(y) consistently return the same value, provided neither object is modified
    • Not null: x.equals(null) returns false.

    A best practice is to make Key types immutable, because consistency cannot otherwise be guaranteed.

§ Ordered symbol tables.

In typical applications, keys are Comparable objects, so the option exists of using the code a.compareTo(b) to compare two keys a and b. Several symbol-table implementations take advantage of order among the keys that is implied by Comparable to provide efficient implementations of the put() and get() operations. More important, in such implementations, we can think of the symbol table as keeping the keys in order and consider a significantly expanded API that defines numerous natural and useful operations involving relative key order. For applications where keys are Comparable, we implement the following API:

Ordered Symbol-table API

Examples of ordered symbol table operations
  • Minimum and maximum. Perhaps the most natural queries for a set of ordered keys are to ask for the smallest and largest keys. We have already encountered the need for these operations (in our discussion of priority queues in Section 3.4).
  • Floor and ceiling. Given a key, it is often useful to be able to perform the floor operation (find the largest key that is less than or equal to the given key) and the ceiling operation (find the smallest key that is greater than or equal to the given key). The nomenclature comes from functions defined on real numbers (the floor of a real number x is the largest integer that is smaller than or equal to x and the ceiling of a real number x is the smallest integer that is greater than or equal to x).
  • Rank and selection. The basic operations for determining where a new key fits in the order are the rank operation (find the number of keys less than a given key) and the select operation (find the key with a given rank). We also have already encountered the need for these operations, in our discussion of sort applications in Section 2.5.
  • Range queries. How many keys fall within a given range? Which keys fall in a given range? The two-argument size() and keys() methods that answer these questions are useful in many applications, particularly in large databases.
  • Delete minimum and delete maximum. Our ordered symbol table API adds to the basic API methods to delete the maximum and minimum keys (and their associated values).
  • Exceptional cases. When a method is to return a key and there is no key fitting the description in the table, our convention is to throw an exception.
  • Key equality (revisited). The best practice in Java is to make compareTo() consistent with equals() in all Comparable types. That is, for every pair of objects a and b in any give Comparable type, it should be the case that (a.compareTo(b) == 0) and a.equals(b) have the same value.

§ Sample clients.

We consider two clients: a test client that we use to trace algorithm behavior on small inputs and a performance client.

  • Test client. The main() client in each of our symbol table implementations reads in a sequence of strings from standard input, builds a symbol table by associating the value i with the ith key in the input, and then prints the table.
  • Frequency counter. Program FrequencyCounter.java is a symbol-table client that finds the number of occurrences of each string (having at least as many characters as a given threshold length) in a sequence of strings from standard input, then iterates through the keys to find the one that occurs the most frequently.

§ Sequential search in an unordered linked list.

Program SequentialSearchST.java implements a symbol table with a linked list of nodes that contain keys and values. To implement get(), we scan through the list, using equals() to compare the search key with the key in each node in the list. If we find the match, we return the associated value; if not, we return null. To implement put(), we also scan through the list, using equals() to compare the client key with the key in each node in the list. If we find the match, we update the value associated with that key to be the value given in the second argument; if not, we create a new node with the given key and value and insert it at the beginning of the list. This method is known as sequential search.

Sequential search

 Proposition A.

Unsuccessful search and insert in an (unordered) linked-list symbol table both use N compares, and successful search uses N compares in the worst case. In particular, inserting N keys into an initially empty linked-list symbol table uses ~N^2/2 compares.

§ Binary search in an ordered array.

Program BinarySearchST.java implements the ordered symbol table API. The underlying data structure is two parallel array, with the keys kept in order. The heart of the implementation is the rank() method, which returns the number of keys smaller than a given key. For get(), the rank tells us precisely where the key is to be found if it is in the table (and, if it is not there, that it is not in the table). For put(), the rank tells us precisely where to update the value when the key is in the table, and precisely where to put the key when the key is not in the table. We move all larger keys over one position to make room (working from back to front) and insert the given key and value into the proper positions in their respective arrays.

Binary search symbol table

  • Binary search. The reason that we keep keys in an ordered array is so that we can use array indexing to dramatically reduce the number of compares required for each search, using a venerable classic algorithm known as binary search. The basic idea is simple: we maintain indices into the sorted key array that delimit the subarray that might contain the search key. To search, we compare the search key against the key in the middle of the subarray. If the search key is less than the key in the middle, we search in the left half of the subarray; if the search key is greater than the key in the middle we search in the right half of the subarray; otherwise the key in the middle is equal to the search key.

    Rank

  • Other operations. Since the keys are kept in an ordered array, most of the order-based operations are compact and straightforward.

 Proposition B.

Binary search in an ordered array with N keys uses no more than lg N + 1 compares for a search (successful or unsuccessful) in the worst case.

 Proposition C.

Inserting a new key into an ordered array uses ~ 2N array accesses in the worst case, so inserting N keys into an initially empty table uses ~ N^2 array accesses in the worst case.